EVALUATION


The purpose of this effort was to test and evaluate the LOGOSCAN II Optical Character Reader System to assess its potential for conversion of free-formatted multifont typeset Russian text to a computer-processable format compatible with the Russian-English machine translation system at Foreign Technology Division (FTD).

LOGOSCAN II scanned 27 pages of Russian text supplied by FTD. The system scanned the text at approximately 30 characters per second with an error rate of 1.0-2.0%.

The resultant study demonstrated that the scanning of Cyrillic text is feasible, and that the difficulties encountered in scanning are inherent in the Cyrillic text itself.

LOGOSCAN II is currently not a production system. Further work is needed in the areas of scanning speed, recognition accuracy, and post-editor programs to correct errors before it can become one.
A. INTRODUCTION

The optical scanning of typeset Cyrillic text is perhaps the ultimate test of an OCR system's recognition algorithm. The problem divides logically into four parts:

1. recognizing the character boundaries;

2. differentiating the character from all others in a reference alphabet;

3. tracking a skewed line, and recognizing line boundaries;

4. selecting the correct reference alphabet from a set of possible reference alphabets.

The Logoscan II System solves all four of these problems.


B. SCOPE OF THIS STUDY

The Statement of Work for this contract states its objective to be the "testing and evaluation of the Logoscan II Optical Reader (OCR) System to assess its potential for conversion of free-formatted multifont typeset Russian text to a computer-processable format compatible with the Russian-English machine translation at Foreign Technology Division (FTD)."

Twenty pages of the book:

Doklady Akademii Nauk SSSR (Reports of the USSR Academy of Sciences)

were selected as representative of the general problem and are the subject of the present report. The book contains five separate fonts, viz., three title fonts, one main corpus font, and a bibliographical font. Due to limitations of time and funding, Logoscan II has been optimized on the main corpus font only. Although all five fonts were placed in its memory and were used during the scanning operation, no effort has been made to increase the accuracy rate of the other four fonts. They can, of course, be brought to the same level of accuracy as the main corpus font at a later time. There is no inherent problem in the fonts themselves, nor in Logoscan II's ability to handle them. The primary aim has been to demonstrate a capability to scan such text with an accuracy rate and speed that would be significantly more economical than current manual techniques.
C. EQUIPMENT USED IN THIS STUDY

Many off-the-shelf OCR units were available; however, the ECRM 4500 proved to have the best combination of light source (a laser) and paper-moving mechanism. A Data General S130 Eclipse was selected as the computer that processes the scanner's raw video data. The selection criteria in this case were:

(1) compatibility with existing Logoscan I programs; and

(2) a microcode language that allowed Logos engineers to optimize certain high-frequency program loops.

Logos engineers supervised the building and testing of a special-purpose interface to control the functions of the 4500, bypassing the 4500's internal PDP-8 computer. This interface used a DMA (direct memory access) channel for rapid transfer of information from the ECRM to the Data General (DG) computer.

All functions of the ECRM are controlled by the S130 under an interrupt-driven operating system.

Appendix D contains a detailed list of the equipment used during this study.


D. TECHNICAL PROBLEMS

The four problem areas stated above are detailed here:

1. recognizing the character boundaries:

Since this book contains proportionally spaced characters, most of them serifed, a scanner operating at a 4 mil resolution frequently encounters touching characters. This would be true even at a 1 mil resolution, because of the quality of certain sections of the typeset page (Page B1). The Logoscan II algorithm has been designed to search for clues to the location of a boundary between touching characters, even when they are proportionally spaced.
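The report does not publish how Logoscan II locates the boundary between touching characters. As a generic illustration only, not the Logos algorithm, one common clue is a vertical projection profile: count the black pixels in each column of a touching pair and cut at the column with the least ink, keeping each piece wide enough to be a character. The bitmap representation (rows of 0/1 integers) and the `min_width` parameter are assumptions of this sketch.

```python
def column_profile(bitmap):
    """Count black pixels (1s) in each column of a binary bitmap given as rows."""
    return [sum(column) for column in zip(*bitmap)]


def best_cut(bitmap, min_width=2):
    """Return the interior column with the least ink, or None if the blob is
    too narrow to hold two pieces of at least `min_width` columns each."""
    profile = column_profile(bitmap)
    candidates = range(min_width, len(profile) - min_width)
    if not candidates:
        return None
    return min(candidates, key=lambda c: profile[c])


def split_at(bitmap, cut):
    """Split the blob into left and right pieces, dropping the cut column."""
    left = [row[:cut] for row in bitmap]
    right = [row[cut + 1:] for row in bitmap]
    return left, right


# Two vertical strokes joined by their serifs and by a smear of ink.
blob = [
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
]
cut = best_cut(blob)
left, right = split_at(blob, cut)
print(cut, len(left[0]), len(right[0]))  # prints: 3 3 3
```

A minimum-ink cut alone can fall inside a wide letter, so a practical system would also confirm that both pieces are recognizable characters, which is the iterative separation Fig. 2 in Appendix B illustrates.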


The upper and lower limits of a character are difficult to find, particularly when there are many ascenders and descenders (Page B4). Line skewing compounds this problem. Again, the design of the algorithm allows it to track a line and to exclude ascending or descending characters that intrude from adjacent lines.


It is this ability to define character boundaries in a proportionally spaced line, in the presence of noise, with tight line spacing, and in highly serifed fonts that makes Logoscan II a fourth-generation OCR device. The second characteristic that identifies the system as a new generation of OCR is its signature algorithm; standard masking techniques are not used.
Characters such as the Russian H can be broken; that is, the two solid vertical strokes can appear to be two separate characters if the horizontal stroke is weak or missing. Logoscan II uses a hardware/software combination to solve this problem.

If the horizontal stroke is merely weak, good resolution will enable the system to see a stroke that might be missed with a less sensitive read head. To this end, Logos engineers modified the paper-moving subsystem so that the ECRM now has an effective resolution of 3 mils along the vertical axis of the page.

If the horizontal stroke is missing, however, Logoscan II relies on rules that examine the areas around what looks like a piece of a character. If the system discovers a shape that, when added to the first piece, could be defined as a character, it merges them and recognizes a single character.
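The merging rule can be pictured with the hypothetical sketch below; it is not the Logoscan II rule set. Fragments are reduced to bounding boxes, and a fragment is merged into the one before it when the gap between them is small and the combined box is still no wider than a character. In the report, the test is whether the merged shape "could be defined as a character"; here the assumed size limits `max_gap` and `max_char_width` (in pixels) stand in for that recognition attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Inclusive pixel bounding box of one connected piece of ink."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0 + 1


def union(a: Box, b: Box) -> Box:
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def merge_fragments(boxes, max_gap=3, max_char_width=12):
    """Scan fragments left to right and merge each into its predecessor when
    the horizontal gap is small and the union is still character-sized."""
    merged = []
    for box in sorted(boxes, key=lambda b: b.x0):
        if merged:
            prev = merged[-1]
            gap = box.x0 - prev.x1 - 1
            combined = union(prev, box)
            if gap <= max_gap and combined.width <= max_char_width:
                merged[-1] = combined
                continue
        merged.append(box)
    return merged


# The two stems of an H whose crossbar did not print, then the next letter.
pieces = [Box(0, 0, 1, 9), Box(5, 0, 6, 9), Box(10, 0, 16, 9)]
print(merge_fragments(pieces))
```

Because a size test alone would also merge two genuinely separate narrow letters, the recognition check described in the text is what makes such a rule safe in practice.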


2. differentiating the character from all others in a reference alphabet:

The main corpus font contains unique characters, some of which look very similar to an OCR device, e.g., a Russian H and H.

The method of signature analysis used by Logoscan II enables it to differentiate such characters correctly. The same character, such as a Russian H, can vary when examined at different points during the scanning process (Page B7). The OCR must recognize every occurrence of this character correctly, even if it appears quite different each time.

Characters can also vary from their normal appearance. This can be caused by missing pieces (usually serifs) or by noise that disguises the character's shape. The method of signature analysis used enables the system to differentiate between similar (but distinct) characters, yet accurately identify characters even when they vary in appearance.
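The report does not describe what a Logoscan II signature contains, only that it replaces template masking. As a hedged, generic stand-in, the sketch below uses zoning features: the glyph is divided into a 3 x 3 grid, the ink density of each cell forms the signature, and a glyph is assigned to the nearest reference signature by L1 distance, or rejected if no reference is close enough. The zone count, the distance measure, and `reject_threshold` are all assumptions.

```python
def zone_signature(bitmap, zones=3):
    """Ink density (fraction of 1s) in each cell of a zones x zones grid."""
    h, w = len(bitmap), len(bitmap[0])
    signature = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            signature.append(sum(cells) / len(cells) if cells else 0.0)
    return signature


def classify(bitmap, references, reject_threshold=1.5):
    """Label of the nearest reference signature (L1 distance), or None (a reject)."""
    signature = zone_signature(bitmap)
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        dist = sum(abs(a - b) for a, b in zip(signature, ref))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= reject_threshold else None


EN = [[1, 0, 0, 0, 0, 1]] * 2 + [[1] * 6] + [[1, 0, 0, 0, 0, 1]] * 3  # Cyrillic EN
PE = [[1] * 6] + [[1, 0, 0, 0, 0, 1]] * 5                             # Cyrillic PE
references = {"EN": zone_signature(EN), "PE": zone_signature(PE)}

noisy_en = [row[:] for row in EN]
noisy_en[0][2] = 1                             # a speck of foreground noise
print(classify(noisy_en, references))          # EN
print(classify([[0] * 6] * 6, references))     # None: a blank glyph is rejected
```

The tolerance is what lets the same letter be accepted despite noise or thin strokes, while the distance between different references keeps look-alike letters apart.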

Appendix C illustrates the system's processing of subscripts and superscripts.


3. line skewing and recognizing line boundaries:

Most OCR units introduce dynamic and static line skewing. This is caused by the paper transport system or by the original positioning of text on the page. In addition, ascending and descending characters make the job of defining the boundaries of a line much more difficult (Page B4).


Logoscan II will read dense text, i.e., more than 8 lines to the inch, with ascenders and descenders, and can track a line skewed by as much as one-half of a line's height.
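One way to picture line tracking is sketched below. It is an assumed, generic method, not the Logoscan II tracker: the page is cut into narrow vertical strips, and in each strip the tracker looks only within half a line height of the line's previous centre, then moves the centre to the ink-weighted mean row found there. Restricting the search window is what keeps ascenders and descenders from neighbouring lines from capturing the track; `line_height` and `strip_width` are illustrative parameters.

```python
def track_line(page, start_row, line_height, strip_width=8):
    """Follow one text line across a binary page image, strip by strip.

    Returns the estimated centre row of the line in each vertical strip.
    """
    height, width = len(page), len(page[0])
    half = line_height // 2
    centre = start_row
    path = []
    for x0 in range(0, width, strip_width):
        x1 = min(x0 + strip_width, width)
        lo, hi = max(0, centre - half), min(height, centre + half + 1)
        ink = [(r, sum(page[r][x0:x1])) for r in range(lo, hi)]
        total = sum(weight for _, weight in ink)
        if total:  # blank strips (word spaces) keep the previous centre
            centre = round(sum(r * weight for r, weight in ink) / total)
        path.append(centre)
    return path


# A 20 x 32 page: one line that drops a row every 8 columns (skew),
# plus a neighbouring line at row 15 that must not capture the track.
page = [[0] * 32 for _ in range(20)]
for x in range(32):
    page[5 + x // 8][x] = 1
    page[15][x] = 1
print(track_line(page, start_row=5, line_height=6))  # [5, 6, 7, 8]
```

The drift the tracker can follow per strip is bounded by the window, so how much total skew it tolerates depends on the strip width chosen.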


4. selecting the correct reference alphabet from a set of reference alphabets:

The five fonts noted above can appear on any page. The system tests a new line (if it was preceded by a blank line) against each font and selects the one with the lowest error count. Logoscan II can store as many as ten such reference fonts in its main memory. If more were needed, it could use its secondary disk memory to call in any number of additional fonts.
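The selection rule stated above, trying each stored font on a line that follows a blank line and keeping the one with the lowest error count, can be sketched as follows. Here the error count is taken to be the number of rejected glyphs, which is an assumption, and the recognisers are toy stand-ins, not Logoscan II components.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A recogniser maps one glyph to a label, or to None when it rejects the glyph.
Recogniser = Callable[[str], Optional[str]]


def select_font(line: Sequence[str], fonts: Dict[str, Recogniser]) -> str:
    """Run every reference font over the line; keep the one with fewest rejects."""
    def reject_count(name: str) -> int:
        recognise = fonts[name]
        return sum(1 for glyph in line if recognise(glyph) is None)
    return min(fonts, key=reject_count)


def assign_fonts(lines: List[Sequence[str]],
                 fonts: Dict[str, Recogniser]) -> List[Optional[str]]:
    """Re-select the font only for a line that follows a blank line."""
    current: Optional[str] = None
    previous_blank = True          # the first line of a page is treated as new
    result: List[Optional[str]] = []
    for line in lines:
        if not line:               # blank line: no font; the next line is re-tested
            previous_blank = True
            result.append(None)
            continue
        if previous_blank:
            current = select_font(line, fonts)
        result.append(current)
        previous_blank = False
    return result


# Toy stand-ins: each "font" only accepts glyphs of one case.
fonts = {
    "main corpus": lambda g: g if g.islower() else None,
    "title": lambda g: g if g.isupper() else None,
}
lines = [list("ABC"), list("abc"), [], list("abc")]
print(assign_fonts(lines, fonts))
```

Under the rule as stated, a title line followed directly by body text (no blank line) keeps the title font for the body line; the second entry of the example shows this.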

Although solutions to the above problems have been incorporated into the Logoscan II software, there remains a set of problems related to the quality and content of the typeset page.


o Background Noise

As Figure 4, Page B2, illustrates, the reverse side of a page shows through to the side being scanned. This background noise is general throughout the book used in this study. In this book it cannot be eliminated, and its presence causes more than half of the errors experienced on any given page.

The thickness and quality of the paper used in a book govern the degree of background noise. As the need for and desirability of scanning typeset material become more widely recognized, the problem of background noise could be largely eliminated by proper selection of paper.


o Foreground Noise

Dirt, ink spots, and poor-quality paper all introduce distortions to the character as seen by the laser light source. Figure 3, Page B2, shows the sort of foreground noise experienced in this book.


o Identical Characters

A Russian H and an English H have the same shape in the main corpus font. Only a post-processor program examining the character in context can differentiate such pairs.

Page A4 lists the combinations appearing in this book.
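A context post-processor of the kind described could work at the word level. The sketch below is hypothetical (the report only says such a program is needed): letters whose Cyrillic and Latin forms look identical are treated as ambiguous, the script of the word is decided by a vote of its unambiguous letters, and every ambiguous letter is rewritten into that script. The `PAIRS` table lists only some capital pairs and is illustrative; the combinations for this book are those listed on Page A4.

```python
import unicodedata

# Illustrative Cyrillic -> Latin look-alike capitals (not the full Page A4 list).
PAIRS = {
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041d": "H", "\u041a": "K",
    "\u041c": "M", "\u041e": "O", "\u0420": "P", "\u0421": "C", "\u0422": "T",
    "\u0425": "X",
}
TO_LATIN = PAIRS
TO_CYRILLIC = {latin: cyrillic for cyrillic, latin in PAIRS.items()}
AMBIGUOUS = set(TO_LATIN) | set(TO_CYRILLIC)


def script_of(ch):
    """'cyrillic', 'latin', or None, judged from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    if name.startswith("LATIN"):
        return "latin"
    return None


def resolve_word(word):
    """Rewrite ambiguous letters into the script of the word's other letters."""
    votes = {"cyrillic": 0, "latin": 0}
    for ch in word:
        script = script_of(ch)
        if ch not in AMBIGUOUS and script is not None:
            votes[script] += 1
    if votes["latin"] > votes["cyrillic"]:
        return "".join(TO_LATIN.get(ch, ch) for ch in word)
    if votes["cyrillic"] > votes["latin"]:
        return "".join(TO_CYRILLIC.get(ch, ch) for ch in word)
    return word  # no evidence either way: leave it for the human editor


print(resolve_word("\u041dCl"))                          # HCl
print(resolve_word("H\u0418T") == "\u041d\u0418\u0422")  # True
```

A word made up only of ambiguous letters is left unchanged, which is exactly the case that would still need a human editor or a dictionary check.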
E. GENERAL METHODOLOGY


After interfacing the ECRM/S130 system, Logos personnel concentrated on two main areas:

1. Recognition Speed

This activity required analysis of the Logoscan algorithms and supporting programs to find high-frequency loops that could be converted to S130 microcode. Writing and debugging this code proceeded in parallel with the second effort.

2. Recognition Accuracy

The five fonts were acquired, and repeated testing of the main corpus font resulted in a fine-tuned set of character signatures for this font. No such fine-tuning was attempted on the other four fonts, nor were the Greek letters in the main corpus font optimized.


F. TECHNICAL RESULTS

The recognition speed on any given line is now 30 characters per second. The average recognition speed over a set of lines is 25 characters per second. This latter rate could be improved at a later date by foreground/background tasking of data acquisition and recognition.

The recognition accuracy for the main corpus font is now 98.6%.
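To put these figures on the same footing as the cost estimate in Section H, which is based on a 2500-character page, the arithmetic below converts them to time and errors per page. It counts recognition time only, at the 25 characters-per-second average.

```python
PAGE_CHARS = 2500       # page size used in the report's cost estimate
AVERAGE_CPS = 25        # average recognition speed over a set of lines
ACCURACY = 0.986        # main corpus font

seconds_per_page = PAGE_CHARS / AVERAGE_CPS        # 100 s
pages_per_hour = 3600 / seconds_per_page           # 36 pages
errors_per_page = PAGE_CHARS * (1 - ACCURACY)      # about 35 characters

print(f"{seconds_per_page:.0f} s/page, {pages_per_hour:.0f} pages/h, "
      f"{errors_per_page:.0f} errors/page")
```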

A high percentage of errors can be traced to the source document. Although Logoscan II is designed to allow for wide variations within a character type, in a document of this quality characters are sometimes so distorted as to be recognizable only in context.


G. IMPLICATIONS FOR FURTHER RESEARCH


The Logoscan II System is not now an operational system in a production environment. It could be made cost-effective for a number of Russian fonts with approximately three months of further effort, and with nine months of further effort it would be able to input any Russian font at a fraction of current keyboarding cost.


H. A PRODUCTION SYSTEM

Certain facts about Logoscan II in a production environment can be stated, and estimates of others are given below.
The system can read original pages or Xerox copies. Either way the preparatory work is about the same, viz.:

1. mark graphics as shown on Page B5.

2. align one edge of the paper at right angles to a line on the page. This would not be required for a book typeset in this country, but it is required for the book used in this test.

3. feed the pages 50 at a time into the scanner's paper feed tray.

The preparatory work for 50 pages is approximately one man-hour.


Reject Rate

The rejection rate is approximately 0.9 times the error rate; that is, 9 out of 10 errors can be flagged as errors.

When Logoscan II is completed, the estimated error rate for all Cyrillic fonts will be between 1% and 2%. This spread is a function of the quality of the original document; for the book used in this study it would be 1%. Under these conditions there would be 22 flagged errors on a 2500-character page, and given a good CRT editing system, they could be converted to correct characters in 0.041 man-hours.
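The figures in this paragraph are consistent with each other, as the arithmetic below shows. It assumes the 2500-character page used in the cost estimate; the time per correction is derived here and is not stated in the report.

```python
import math

PAGE_CHARS = 2500        # characters per page (from the cost estimate)
ERROR_RATE = 0.01        # projected for the book used in this study
FLAGGED_SHARE = 0.9      # "9 out of 10 errors can be flagged"
EDIT_HOURS = 0.041       # man-hours to correct one page's flagged errors

errors = PAGE_CHARS * ERROR_RATE                 # 25 errors per page
flagged = math.floor(errors * FLAGGED_SHARE)     # 22.5 -> 22 flagged, as reported
unflagged = errors * (1 - FLAGGED_SHARE)         # about 2.5 errors pass unflagged
seconds_per_fix = EDIT_HOURS * 3600 / flagged    # about 6.7 s per correction

print(flagged, round(unflagged, 1), round(seconds_per_fix, 1))
```

The unflagged remainder is why the edited page is described below as "nearly" rather than fully error-free.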


Areas for Improvement

To achieve a full-scale production system, certain new operational features would be required:

1. Additional special tests for problem characters.

2. A second ECRM unit interfaced to the S130. This would allow parallel processing of two separate books.

3. Post-editor programs to correct errors automatically by examining them in context.

4. A well-designed CRT editing system for Russian text.

As noted, about nine months of effort would achieve these goals.


Resultant Cost


Such a complete production system could be achieved in nine months for approximately $310,000, including all hardware, software, and manpower costs.


Comparative Cost


Without current keyboarding cost-accounting figures, no comparison can be made of automated vs. manual techniques. However, an estimate can be made of the cost of a 2500-character page using the full-scale production system. The costs to achieve an edited (and therefore nearly error-free) 2500-character page are:


1. System Costs:

Since the equipment will be user property after nine months, only monthly maintenance fees and power requirements are used as the basis for this estimate. In a one-shift, two-man operation, approximately 800 such pages could be processed per day to the point where editors could begin correcting rejects. It would require 4.1 editing operators to correct the rejects for these 800 pages. The system cost would be $0.036 per page.


2. Manpower Costs:

As stated above, these 800 pages would require two operators and 4.1 editors per day, or 0.061 man-hours/page.
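The manpower figures can be reconstructed from the 0.041 man-hours of editing per page given under Reject Rate, assuming an 8-hour shift (the report says "one-shift" but does not give its length). The check below reproduces both the 4.1 editors and the 0.061 man-hours per page under that assumption.

```python
import math

PAGES_PER_DAY = 800
OPERATORS = 2
EDIT_HOURS_PER_PAGE = 0.041   # from the Reject Rate estimate
SHIFT_HOURS = 8               # assumption: the shift length is not stated

editing_hours = PAGES_PER_DAY * EDIT_HOURS_PER_PAGE    # 32.8 man-hours
editors = editing_hours / SHIFT_HOURS                  # 4.1 editors
man_hours_per_page = (OPERATORS + editors) * SHIFT_HOURS / PAGES_PER_DAY  # 0.061

print(round(editors, 1), round(man_hours_per_page, 3))
```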
Appendix A

Sample page from "Doklady Akademii Nauk SSSR"

Output from Logoscan II

Russian-English Alphabet Map used by Logoscan II


[Scanned sample page, not reproduced: Doklady Akademii Nauk SSSR, 1976, Vol. 230, No. 6, UDC 547.963.3. Article: "Kinetic features of affinity modification of biopolymers in reactions proceeding with the participation of active intermediate particles", by Corresponding Member of the USSR Academy of Sciences D. G. Knorre et al. The Logoscan II output for this page follows.]


Dokl  a  d  y  Akade  »  ii  nauk  SSSR 

_  1976,  Tem  230*  ♦  6 


UDK  547.96Z.Z 

C le  nTk orre e  p e u d  e u  t  AN  SS6R  "D7  GT  KP0RRE7 
S.  N  RI0P0R,  T.  A.  CIMI7*RA 

KINETICESKIE  OSOVENNOSTN  AFINN01  MODIFNKXQII  ~ 

(GO*  S  *CXSTIEM 
XKTPNN6P*  PR0MEJUT0CN6P*CXSTIQ 


Kineticeskiv  zakonomernosti  afinnol  modif ikaqi i  v  nasto45ee  vre- 
m4  proanal izirovany  dl4  sluca4,  ko5a  reagent  preterpevaet  prevra5enie 
tolSlCo  Waxod4s6  v  ^7b“mpYek s e  1  "modi F I cfi~ru e  m6*l "bipol imefom  za" scet  su- 
Sestvennogo  vozrastani4  konstanty  skorosti  vzaimodelstvi4  reagiruh- 
5el  gruppy  s  modif iqiruemol  gruppol  biopolimera  v  rezulbtate  ix  pro 
s' tr  dnst venndrb- sbli jeni 4T" V~ 3 to»~ slucae  sxema~ pr evfa5en i 4  zapisyvaet- 
s4  v  vide 

E+X=EX>EZ. 


gde  E  modif iqiruemyl  biopolimer,  X-  afinnyl  reagent,  EX-  komp- 
leks  biopol imer— reagent,  EZ  -produkt  modifikagii.  Pri  dostatocnom 
izbytke  reagenta  i  n  e  t  i  k.  a" r  eak qi i~"o pi syvdets  4~k.  i  h  e  t  i  c e sk7i »“  iirroviS*s ni- 
em  dl4  reakgii  pervogo  p<y4dka  po  biopolimeru  s  k.aju5els4  konstantol 
skorosti,  zavis43el  ot  konqentraqii  reagenta  (1,  2).  V  rabote  (3>  rewenie 
rasprostraneriO"  hd  slucalobrdtimol  mod  IT  lkaqfi  i ,"  nasi  iJcdT^kogda” v~  pro- 
qesse  modifikaqii  biopolimer  proxodit  cerez  neskol6ko  prone jutocnyx 
sosto4nil,  i  predlojen  pr ibl i jenny 1  metod  reweni4  dl4  sluca4,  kogda 
konqentra'qi*  bTopolimera- veTicind  togo  je  por4dka^~"ctb~ik“onqentrdqi  4  - 
reagenta. 

V  nasto45el  rabote  rassmatrivaets4  kinetika  afinnol  modifikaqii 
r  1 4  “s  1  uca 4 ,  "  kagd <T -pf eVr a5 e n i e'reage  n td”p fo  x  o  d it  cerez  prbmejutbcnoe 
obrazovanie  aktivnyx  castiq,  pricem  obra3ovanie  3tix  castiq  4vl4ets4 
limitiruhSel  stadiel  i  v  pervom  priblijenii  ne  zavisit  ot  prisutstvi4 
mod  i  f  i  q  i  ruemogb  biopolimera".  StaKpmisluca4miprixbdit~s4stalkivat6- 
s4  pri  fotoafinnol  modifikaqii  biopolimerov.  reagentami,  soderja- 
Simi  azidnuh  gruppu,  kogda  limitiruh5el  stadiel  4vllets4  prevra5e- 
xie  a3idnol  gruppy  v  biradikal  nitren  (41  “i~  pri  modif  rkaqlT " p> o i z vbd 
nymi  aromaticeskix  2-xlor3ti laminov,  pri  k.otorol  limitiruhSel  sta 
diel  ok.azyvaets4  prevraSenie  2xlor-3ti  laminogruppy  v  reakqi*nnospo- 
sobnyl  3  til  e  hi  mm  o  ni  ev  y  1  Ration  <"377  V  3  tom  sxucae  obrazbvavwies4 
aktivnye  prome jutocnye  cast iqymogutl ibomodif iqirovat6biopolimer , 
libo  reagirovatd  s  molekulami  rastvoritel4  i  drugimi  ni3komolekul4r- 
nymi  k om n d n e n t am T  ra s tvor a”.  V “ "p“ri nqip e"  ne  iskTficena  vozmo  jriosti  T 
nespeqif iceskol  modifikaqii  biopolimera.  D14  slGca4  afinnogo  < komp- 
lementarno-adresovannogo >  alkilirovzni4  nukleinovyx  kislot  proizvod- 
nymi “oligonukleogidov,  ~hesu5imi  bstdtdk"  aromaticeskogo  2xTor3til- 
amina,  pokazano,  cto  nespeqif icesk.a4  modifikaqi4  vne  kompleksa 
nukleinova4  k.islota  -reagent  proisxodit  v  nezxac itelAnol  stepeni  (6). 

I'14  fotoafinnyx- redgentov- opisdny  s4ucai  znac itelbnbl- 4es"neqif icnol 
modifikaqii  biopolimera  (7). 
RUSSIAN-ENGLISH ALPHABET EQUIVALENTS

[Table of Russian letters and their English equivalents; not legible in this copy.]


Similar shapes in Cyrillic and Roman alphabets

Lower Case
Roman    Cyrillic (transliteration)
a        а (a)
c        с (s)
e        е (e)
m        м (m)
o        о (o)
p        р (r)
r        г (g)
x        х (x)
y        у (u)

Upper Case
Roman    Cyrillic (transliteration)
A        А (A)
B        В (V)
C        С (S)
E        Е (E)
H        Н (N)
K        К (K)
M        М (M)
O        О (O)
P        Р (R)
T        Т (T)
X        Х (X)
Y        У (U)
Appendix B

A close study of the Russian word in Fig. 1 provides a good example of the difficulties in scanning this type of document and of the methods Logoscan II employs to overcome them.


Figure 1


Figure 2(A) is a "normal" character as seen by the scanner, i.e., positioned correctly in relation to the other characters and not touching surrounding characters. The next three characters, "дду", all contain descenders and all bleed into each other (Figure 2(B)). The system first "pushes" the characters up (Figure 2(C)), then makes a first effort at separation (Figure 2(D)). The system then decides that this shape, Figure 2(D), is still two characters and separates them further, this time recognizing the character as "д" (Figure 2(E)).


Figure 2 (panels A-H)


The procedure is repeated to define the next "д" (Figure 2(F), (G), (H)). Note that processing is restarted from the original positioning of the characters, so that a line of text can be tracked accurately even when it is skewed.




When scanning typeset material, the OCR will be presented with a variety of noise on the page. This can be caused by ink splattered on the platen (Fig. 3, to the left of the characters), by impressions from the reverse side of the page (Fig. 4), or by chips and pieces of type suspended between characters (Fig. 5).


Figure 3

Figure 4

Figure 5


Logoscan II was able to handle these cases correctly (Figs. 6, 7, 8). However, in certain cases the noise was too severe and the character was distorted (Fig. 9). Characters such as this account for 50% of the error rate.


Figure 6

Figure 7


V  zaklhceru  e  avtory  vyrajaht  prier.atel6r.ost6  H.  6.  Astafdevu 


Figure 8

Figure 9


Figure 10


Characters formed by hot type can be deformed by missing pieces. While this usually appears as weak strokes, sometimes a chipped slug, or ink missing from part of the slug, produces characters such as those in Fig. 10.




Ascending and descending characters are major problems for OCR devices, particularly at 7.7 lines per inch. Difficult situations such as the one pictured in Fig. 11 can occur.

The sequence left parenthesis, superscript 6, right parenthesis, right parenthesis is dangerously close to both the bottom portion of the Cyrillic "ф" on the line above and the top portion of the Cyrillic "б" on the line below. Logoscan II was able to handle areas like this because it can "track" a line of text and decide which areas to examine.


Figure 11


To handle graphics, Logoscan II requires the operator to draw a line to the left of the graphics area (Fig. 12). The system recognizes this line as a delimiter and goes on to the next line of text. In the output, the system leaves proportional white space for later insertion of the graphics (Fig. 13).
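The delimiter convention can be sketched as follows, under assumptions not in the report: the operator's line is detected as ink in the first few columns of the page (so ordinary text must not start there), each run of marked rows is treated as one graphics region, and the number of blank output lines reserved for it is its height divided by the nominal line height. The names `margin_cols` and `line_height` are hypothetical.

```python
def marked_regions(page, margin_cols=2):
    """(first_row, last_row) runs in which the left margin carries ink,
    i.e. rows covered by the operator's delimiter line."""
    regions, start = [], None
    for r, row in enumerate(page):
        marked = any(row[:margin_cols])
        if marked and start is None:
            start = r
        elif not marked and start is not None:
            regions.append((start, r - 1))
            start = None
    if start is not None:                  # delimiter runs to the page bottom
        regions.append((start, len(page) - 1))
    return regions


def blank_lines_for(region, line_height):
    """Empty output lines that reserve the same vertical space as the region."""
    first, last = region
    return round((last - first + 1) / line_height)


page = [[0] * 10 for _ in range(30)]
for r in range(10, 25):
    page[r][0] = 1                         # the operator's pencil line
regions = marked_regions(page)
print(regions, [blank_lines_for(g, line_height=5) for g in regions])  # [(10, 24)] [3]
```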


Figure 12

Figure 13

Figure 14


The sequence "ы*" (where * represents any character) posed a special problem for Logoscan II when the character following the "ы" bleeds into it (Fig. 14). Since Logoscan II looks for characters when presented with a group of touching characters, and the shape "ь" is a legitimate Cyrillic character, the "I" portion of the "ы" might be taken as part of the next character.

This situation can only be corrected by post-processing using grammatical rules, which Logos plans to implement at a later date.




The printout below demonstrates how much a typeset character can vary. Each of the figures (A, B, C) is an image of what the read head "sees." The system has to identify each of these as the same character, despite the varying line thicknesses.

[Printout: scanner images A, B, C of the same character]


Logoscan II is able to handle subscripts and superscripts, as shown below:

INPUT:

[scanned input line]

OUTPUT:

5imis4  soedineni4irii  (13).  Esli  polHevnue  no  mi  3ksper;imevtal 6vye 
INPUT:

[scanned input line]

OUTPUT:
turol  tipa  f lhor ito  ob5el  formuly  Ndl-xTexOl+xFl-x.  Oni  subest 

Superscripts and subscripts are somewhat more difficult than normal characters in the main corpus font because of their lack of definition. The three pictures below are subscripts as seen by the scanner at 3 by 4 mils resolution.

[Three scanner images of subscripts]
Appendix D

DATA GENERAL

ITEM           QTY  DESCRIPTION

HARDWARE:

CPU and Related Items
DG/8611-I      1    Eclipse S/130 computer with 64KB MOS memory, battery backup and ERCC
DG/8615        1    Writable control store
DG/8614        1    Character Instruction Set
DG/1012P       1    One-bay cabinet (240V @ 50A)

Console, Editing Terminals and Related Items
DG/4075        2    I/O Interface Subassembly
DG/4078        2    EIA (RS232C) Interface
DG/6052-D      1    CRT Display Terminal (24X80)
DG/6086        1    180 cps, bidirectional 9X7 dot matrix printer
DG/4079        1    Real Time Clock

Disc Subsystem and Related Items
DG/6070        1    20 Mbyte DGC Cartridge Disc Subsystem (10 fixed, 10 removable), includes cartridge

Interface
DG/ECRM4500    1    Direct memory access: DG/8611-I and ECRM/4500
                    Control: DG/8611-I and ECRM/4500

SOFTWARE:

Eclipse Stand-Alone Operating System, Assembler, Macro-Assembler, Fortran IV, Basic, Algol Compilers and System Utilities
Eclipse RDOS Operating System, Assembler, Macro Assembler, Fortran IV, Basic, Algol Compilers, Sort/Merge, CSP & System Utilities
Eclipse RTOS Real Time Operating System
Eclipse Operating System for Eclipse Series
Diagnostic Operating System for Peripherals


ECRM

ITEM           QTY  DESCRIPTION

HARDWARE:

ECRM/4500      1    4000 Series Autoreader


